{"nbformat":4,"nbformat_minor":0,"metadata":{"anaconda-cloud":{},"kernelspec":{"name":"python3","display_name":"Python 3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.5.2"},"colab":{"name":"Tutorial V.ipynb","provenance":[],"collapsed_sections":["r-csPGkyt2NU","6rPsRTmIt2NR","KMoIcgBzt2NZ","RAJ6Lsurt2Ne","57AZ1Nzat2Nm","CNbOLvQYt2Nr","pwH_H7b6t2N1","Wh0y_Hr4t2N5","6Rc9mTd-t2OM","2K71p8F0t2OS","MN0LvX7dt2Of","Bj0ra0Cpt2Og","1u0_qadxt2Ok","q5x83mdht2Ol","FxfnvGort2Om","lwcVPGjtt2Oo"],"toc_visible":true},"accelerator":"GPU"},"cells":[{"cell_type":"markdown","metadata":{"id":"TXHZ24Eyt2NN","colab_type":"text"},"source":["# Tutorial V: Deep models"]},{"cell_type":"markdown","metadata":{"id":"_Fcuv4BIt2NP","colab_type":"text"},"source":["<p>\n","Bern Winter School on Machine Learning, 27-31 January 2020<br>\n","Prepared by Mykhailo Vladymyrov.\n","</p>\n","\n","This work is licensed under a <a href=\"http://creativecommons.org/licenses/by-nc-sa/4.0/\">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>."]},{"cell_type":"markdown","metadata":{"id":"8sVP5taKt2NQ","colab_type":"text"},"source":["In this session we will use the pretrained Inception model to build our own image classifier. We will also learn how to save our trained models."]},{"cell_type":"markdown","metadata":{"id":"r-csPGkyt2NU","colab_type":"text"},"source":["## 1. 
Load necessary libraries"]},{"cell_type":"code","metadata":{"id":"awNPpOj9CF6O","colab_type":"code","colab":{}},"source":["# if using google colab\n","%tensorflow_version 2.x"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"ja1-byr4t2NV","colab_type":"code","colab":{}},"source":["import sys\n","import os\n","\n","import numpy as np\n","import matplotlib.pyplot as plt\n","import IPython.display as ipyd\n","import tensorflow.compat.v1 as tf\n","tf.disable_v2_behavior()\n","from PIL import Image\n","\n","# We'll tell matplotlib to inline any drawn figures like so:\n","%matplotlib inline\n","plt.style.use('ggplot')\n","\n","from IPython.core.display import HTML\n","HTML(\"\"\"<style> .rendered_html code { \n","    padding: 2px 5px;\n","    color: #0000aa;\n","    background-color: #cccccc;\n","} </style>\"\"\")"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"6rPsRTmIt2NR","colab_type":"text"},"source":["### Download libraries"]},{"cell_type":"code","metadata":{"id":"qVXZj9jAt2NS","colab_type":"code","colab":{}},"source":["p = tf.keras.utils.get_file('./material.tgz', 'https://scits-training.unibe.ch/data/tut_files/material.tgz')\n","!mv {p} .\n","!tar -xvzf material.tgz > /dev/null  2>&1"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"WgSm_C4zCi_X","colab_type":"code","colab":{}},"source":["from utils import gr_disp\n","from utils import inception"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"qNo4F8H4t2NX","colab_type":"code","colab":{}},"source":["def tfSessionLimited(graph=None):\n","    session_config=tf.ConfigProto( gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=0.85))\n","    session_config.gpu_options.visible_device_list = str(0) #use 1st gpu\n","    return tf.Session(graph=graph, config=session_config)"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"KMoIcgBzt2NZ","colab_type":"text"},"source":["## 2. 
Load the model"]},{"cell_type":"markdown","metadata":{"id":"DR0oiWk9t2NZ","colab_type":"text"},"source":["The `inception` module here is a small helper module that loads the Inception model and prepares images for training."]},{"cell_type":"code","metadata":{"id":"OMRmTQ19t2Na","colab_type":"code","colab":{}},"source":["net, net_labels = inception.get_inception_model()"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"vUxqY0YQt2Nc","colab_type":"code","colab":{}},"source":["#get the model graph definition and change it to use the GPU\n","gd = net\n","\n","str_dg = gd.SerializeToString()\n","#comment out the next line to keep the model on the CPU\n","str_dg = str_dg.replace(b'/cpu:0', b'/gpu:0') #a bit extreme approach, but works =)\n","gd = gd.FromString(str_dg)\n","\n","#gr_disp.show(gd)"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"RAJ6Lsurt2Ne","colab_type":"text"},"source":["## 3. Create the graph"]},{"cell_type":"markdown","metadata":{"id":"joK6HMIPt2Nf","colab_type":"text"},"source":["This whole model won't fit in GPU memory. 
We will take only the part from the input to the main output and copy it into a second graph, which we will use from now on."]},{"cell_type":"code","metadata":{"id":"KC-OjE53t2Ng","colab_type":"code","colab":{}},"source":["gd2 = tf.graph_util.extract_sub_graph(gd, ['output'])\n","g2 = tf.Graph() # full graph\n","with g2.as_default():\n","    tf.import_graph_def(gd2, name='inception')"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"aP0tpLFKt2Ni","colab_type":"text"},"source":["We can display all operations defined in the graph:"]},{"cell_type":"code","metadata":{"scrolled":true,"id":"HYUm_trrt2Ni","colab_type":"code","colab":{}},"source":["gr_disp.show(g2.as_graph_def())"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"HTudmDd5t2Nk","colab_type":"code","colab":{}},"source":["#get the names of all operations\n","names = [op.name for op in g2.get_operations()]\n","names"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"57AZ1Nzat2Nm","colab_type":"text"},"source":["## 4. Build our own classifier on top"]},{"cell_type":"markdown","metadata":{"id":"BjX0oQVbt2Nn","colab_type":"text"},"source":["We will now create a fully connected classifier the same way as in the previous session. The only difference is that instead of raw image data we will use as input the 2048 image features that Inception is trained to detect. 
We will classify images into 2 classes."]},{"cell_type":"code","metadata":{"code_folding":[0],"id":"tgiwK--Ht2Nn","colab_type":"code","colab":{}},"source":["def fully_connected_layer(x, n_output, name=None, activation=None):\n","    \"\"\"Fully connected layer.\n","\n","    Parameters\n","    ----------\n","    x : tf.Tensor\n","        Input tensor to connect\n","    n_output : int\n","        Number of output neurons\n","    name : str, optional\n","        TF variable scope to use (defaults to 'fc')\n","    activation : callable, optional\n","        Non-linear activation function\n","\n","    Returns\n","    -------\n","    h, W : tf.Tensor, tf.Tensor\n","        Output of the fully connected layer and the weight matrix\n","    \"\"\"\n","    if len(x.get_shape()) != 2:\n","        #flatten higher-rank input to [batch, features]\n","        x = tf.layers.flatten(x)\n","\n","    n_input = x.get_shape().as_list()[1]\n","\n","    with tf.variable_scope(name or \"fc\", reuse=None):\n","        W = tf.get_variable(\n","            name='W',\n","            shape=[n_input, n_output],\n","            dtype=tf.float32,\n","            initializer=tf.initializers.he_uniform())\n","\n","        b = tf.get_variable(\n","            name='b',\n","            shape=[n_output],\n","            dtype=tf.float32,\n","            initializer=tf.initializers.constant(0.0))\n","\n","        h = tf.nn.bias_add(\n","            name='h',\n","            value=tf.matmul(x, W),\n","            bias=b)\n","\n","        if activation:\n","            h = activation(h)\n","\n","        return h, W"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"udLaA2T-t2Np","colab_type":"code","colab":{}},"source":["with g2.as_default():\n","    x = g2.get_tensor_by_name('inception/input:0')\n","    features = g2.get_tensor_by_name('inception/head0_bottleneck/reshape:0')\n","\n","    #placeholder for the true one-hot label\n","    Y = tf.placeholder(name='Y', dtype=tf.float32, shape=[None, 2])\n","        \n","    #one layer with 512 neurons with sigmoid 
activation and one with 2, softmax activation.\n","    L1, W1 = fully_connected_layer(features, 512, 'FC1', tf.nn.sigmoid)\n","    L2, W2 = fully_connected_layer(L1 , 2, 'FC2')\n","    #note: despite its name, 'Logits' holds the softmax probabilities\n","    Y_onehot = tf.nn.softmax(L2, name='Logits')\n","    Y_pred = tf.argmax(Y_onehot, axis=1, name='YPred')\n","    \n","    #cross-entropy is used as a measure of the prediction quality for each image\n","    cross_entropy = tf.nn.softmax_cross_entropy_with_logits_v2(logits=L2, labels=Y)\n","    \n","    #mean cross-entropy over a set of images\n","    loss = tf.reduce_mean(cross_entropy)\n","    optimizer = tf.train.AdamOptimizer(learning_rate=0.0001).minimize(loss)\n","    \n","    #Accuracy is defined as the fraction of correctly recognized images.\n","    Y_true = tf.argmax(Y, 1)\n","    Correct = tf.equal(Y_true, Y_pred, name='CorrectY')\n","    Accuracy = tf.reduce_mean(tf.cast(Correct, dtype=tf.float32), name='Accuracy')"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"CNbOLvQYt2Nr","colab_type":"text"},"source":["## 5. Dataset"]},{"cell_type":"markdown","metadata":{"id":"ovWr8fDDt2Ns","colab_type":"text"},"source":["The Inception network is trained on natural images: things we see around us every day, like sky, flowers, animals, buildings, cars.\n","It builds a hierarchy of features to describe what it sees.\n","These features can be used to quickly train a classifier for other classes of objects. E.g. [here](https://www.tensorflow.org/tutorials/image_retraining) it is retrained to distinguish flower species.\n","\n","Here you will see that these features can even be used to detect things very different from natural images. Namely, we will try to use them to distinguish German text from Italian. 
We will use 100 samples in total, taken from 5 German and 5 Italian books, 10 samples per book."]},{"cell_type":"code","metadata":{"id":"mc_89xtZt2Ns","colab_type":"code","colab":{}},"source":["text_label = ['German', 'Italian']"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"lOD1q75lt2Nu","colab_type":"code","colab":{}},"source":["labels0 = []\n","images0 = []\n","labels1 = []\n","images1 = []\n","\n","#German\n","for book in range(1,6):\n","    for sample in range(1,11):\n","        img = plt.imread('ML3/de/%d_%d.jpg'%(book, sample))\n","        assert(img.shape[0]>=256 and img.shape[1]>=256 and len(img.shape)==3)\n","        images0.append(inception.prepare_training_img(img))\n","        labels0.append([1,0])\n","#Italian\n","for book in range(1,6):\n","    for sample in range(1,11):\n","        img = plt.imread('ML3/it/%d_%d.jpg'%(book, sample))\n","        assert(img.shape[0]>=256 and img.shape[1]>=256 and len(img.shape)==3)\n","        images1.append(inception.prepare_training_img(img))\n","        labels1.append([0,1])\n","        \n","#shuffle both classes (they have the same number of samples)\n","idx = np.random.permutation(len(labels0))\n","labels0 = np.array(labels0)[idx]\n","images0 = np.array(images0)[idx]\n","labels1 = np.array(labels1)[idx]\n","images1 = np.array(images1)[idx]"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"A2Pf2tEpt2Nw","colab_type":"text"},"source":["Let's look at one sample from each class:"]},{"cell_type":"code","metadata":{"id":"DUu3dTUXt2Nw","colab_type":"code","colab":{}},"source":["_, axs = plt.subplots(1, 2, figsize=(10,10))\n","img_d = inception.training_img_to_display(images0[25])\n","axs[0].imshow(img_d)\n","axs[0].grid(False)\n","img_d = inception.training_img_to_display(images1[25])\n","axs[1].imshow(img_d)\n","axs[1].grid(False)\n","plt.show()"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"apBoSWOOt2Ny","colab_type":"text"},"source":["The training is similar to what we did in the second session. 
The new thing here is that we will save the graph model and the trained parameters obtained after training.\n","\n","Since the Inception model is big, this will take a while, even though we use GPUs (one GTX 1080Ti per user). On your laptop CPU this would probably take ~15 times longer. And we are not even training the whole Inception! We only train a small network on top, on a very small dataset!"]},{"cell_type":"code","metadata":{"code_folding":[0],"scrolled":false,"id":"85HJPDcAt2Nz","colab_type":"code","colab":{}},"source":["#We will take 80% of each class for training and 20% for validation\n","n_half = images0.shape[0]\n","n_train_half = n_half*80//100\n","n_train = n_train_half*2\n","\n","x_train = np.r_[images0[:n_train_half], images1[:n_train_half]]\n","y_train = np.r_[labels0[:n_train_half], labels1[:n_train_half]]\n","\n","x_valid = np.r_[images0[n_train_half:], images1[n_train_half:]]\n","y_valid = np.r_[labels0[n_train_half:], labels1[n_train_half:]]\n","\n","mini_batch_size = 10\n","\n","#directory where the model will be stored\n","os.makedirs('Seminar3_graph', exist_ok=True)\n","\n","with tfSessionLimited(graph=g2) as sess:\n","    #accuracy and loss history\n","    a_tr = []\n","    a_vld = []\n","    losses_t = []\n","    losses_v = []\n","\n","    #create saver and initialize all the variables\n","    saver = tf.train.Saver(tf.global_variables())\n","    sess.run(tf.global_variables_initializer())\n","    \n","    #export the graph structure (meta graph)\n","    saver.export_meta_graph(os.path.join('Seminar3_graph', 'model.meta'))\n","\n","    for epoch in range(150):\n","        #shuffle the data and perform stochastic gradient descent by running over all minibatches\n","        idx = np.random.permutation(n_train)\n","        for mb in range(n_train//mini_batch_size):\n","            sub_idx = idx[mini_batch_size*mb:mini_batch_size*(mb+1)]\n","            _, l = sess.run((optimizer, loss), feed_dict={x:x_train[sub_idx], Y:y_train[sub_idx]})\n","            l_v = sess.run(loss, feed_dict={x:x_valid, Y:y_valid})\n","        
    losses_t.append(np.mean(l))\n","            losses_v.append(np.mean(l_v))\n","\n","        #get accuracy on the training set and the validation set\n","        accuracy_train = sess.run(Accuracy, feed_dict={x:x_train, Y:y_train})\n","        accuracy_valid = sess.run(Accuracy, feed_dict={x:x_valid, Y:y_valid})\n","        \n","        #every 10th epoch print accuracies and current loss\n","        if epoch%10 == 0:\n","            print(accuracy_train, accuracy_valid, l)\n","\n","        a_tr.append(accuracy_train)\n","        a_vld.append(accuracy_valid)\n","    \n","    #save the graph state, checkpoint ch-0\n","    checkpoint_prefix = os.path.join('Seminar3_graph', 'ch')\n","    saver.save(sess, checkpoint_prefix, global_step=0, latest_filename='ch_last')\n","  \n","plt.plot(a_tr)\n","plt.plot(a_vld)\n","plt.legend(('training accuracy', 'validation accuracy'), loc='lower right')\n","plt.show()\n","\n","plt.plot(losses_t)\n","plt.plot(losses_v)\n","plt.legend(('training loss','validation loss'), loc='upper right')\n","plt.show()"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"Yr5x3x2at2N0","colab_type":"text"},"source":["We see that the training accuracy hits 100% quickly. Why do you think this happens? Consider that the loss keeps decreasing.\n","Also, on such a small dataset our model overfits."]},{"cell_type":"markdown","metadata":{"id":"pwH_H7b6t2N1","colab_type":"text"},"source":["## 6. 
Load trained variables"]},{"cell_type":"markdown","metadata":{"id":"yrtttZKEt2N1","colab_type":"text"},"source":["If the model graph is already created, we can easily load the saved values of the trained variables from a checkpoint:"]},{"cell_type":"code","metadata":{"id":"6uc8TW0vt2N2","colab_type":"code","colab":{}},"source":["with tfSessionLimited(graph=g2) as sess:\n","    #create saver and restore values\n","    saver = tf.train.Saver()\n","    saver.restore(sess, os.path.join('Seminar3_graph', 'ch-0'))\n","    \n","    #check that we still get proper predictions for one image\n","    r1 = sess.run(Y_onehot, feed_dict={x:images1[:1]})\n","    \n","    print(r1)\n","    "],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"Wh0y_Hr4t2N5","colab_type":"text"},"source":["## 7. Loading the graph and variables. Saving a constant subgraph."]},{"cell_type":"markdown","metadata":{"id":"CyvKyrK1t2N6","colab_type":"text"},"source":["Now, we don't want to redefine the whole model from scratch every time we want to use it. Maybe you built it before a Friday apero and then changed something... And on Monday there is no way to remember! 
\n","<img src=\"https://scits-training.unibe.ch/data/figures/pipeline.png\" alt=\"drawing\" width=\"85%\"/><br>\n","\n","\n","To restore the model we will load the meta graph:"]},{"cell_type":"code","metadata":{"id":"8lrnsxAdt2N8","colab_type":"code","colab":{}},"source":["def get_saved_graph(path):\n","    #create a new graph and import the saved meta graph (structure only) into it\n","    g = tf.Graph()\n","    with g.as_default():\n","        tf.train.import_meta_graph(path)\n","    return g"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"A0WousIGt2N_","colab_type":"text"},"source":["Then, let's restore it into a new graph:"]},{"cell_type":"code","metadata":{"id":"Y_eanv88t2OA","colab_type":"code","colab":{}},"source":["g3 = get_saved_graph(os.path.join('Seminar3_graph', 'model.meta'))"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"aWjMK9gpt2OF","colab_type":"code","colab":{}},"source":["gr_disp.show(g3.as_graph_def())"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"WzzJZSpMt2OH","colab_type":"code","colab":{}},"source":["with tfSessionLimited(graph=g3) as sess:\n","    #create saver and restore values\n","    saver = tf.train.Saver()\n","    saver.restore(sess, os.path.join('Seminar3_graph', 'ch-0'))\n","    \n","    #check that we still get proper predictions for one image\n","    x3 = g3.get_tensor_by_name('inception/input:0')\n","    yhot3 = g3.get_tensor_by_name('Logits:0')\n","    r1 = sess.run(yhot3, feed_dict={x3:images1[:1]})\n","    \n","    print(r1)"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"yGmqAQxOt2OI","colab_type":"text"},"source":["Once the model is trained you want to use it for inference. 
For this, you convert all the variables to constants and obtain the `GraphDef`:"]},{"cell_type":"code","metadata":{"id":"2z-Oy36ct2OJ","colab_type":"code","colab":{}},"source":["g3 = get_saved_graph(os.path.join('Seminar3_graph', 'model.meta'))\n","\n","dst_nodes = ['Logits', 'YPred']\n","\n","with tfSessionLimited(graph=g3) as sess:\n","    # restore variables\n","    saver = tf.train.Saver(tf.global_variables())\n","    saver.restore(sess, os.path.join('Seminar3_graph', 'ch-0'))\n","        \n","    # Now let's convert the trainable parameters to constants for\n","    # inference (dst_nodes is the list of final operations;\n","    # everything they depend on will be converted as well)\n","    graph_def = tf.graph_util.convert_variables_to_constants(\n","        sess, g3.as_graph_def(add_shapes=True), dst_nodes)\n","    "],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"hU1gqULlt2OK","colab_type":"text"},"source":["Finally, we extract only what is needed to compute the `dst_nodes` and export it as `const_graph.pb`:"]},{"cell_type":"code","metadata":{"id":"zKSZz5Jnt2OK","colab_type":"code","colab":{}},"source":["with tf.Graph().as_default():\n","    #extract everything that Logits and YPred depend on\n","    sub_graph = tf.graph_util.extract_sub_graph(graph_def, dst_nodes)\n","    \n","    #save as a binary protobuf\n","    tf.train.write_graph(sub_graph, 'Seminar3_graph/', 'const_graph.pb', as_text=False)"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"6Rc9mTd-t2OM","colab_type":"text"},"source":["## 8. Loading the constant graph"]},{"cell_type":"markdown","metadata":{"id":"fjRcKhwet2OM","colab_type":"text"},"source":["Now you can take `const_graph.pb` and use it for language detection elsewhere. There are no trainable parameters in it: they have all been converted to constants. This is what you deploy in production. You can also use it with the C++ version of TF.\n","\n","Let's check it again. 
We will create one more graph and load only this file into it:"]},{"cell_type":"code","metadata":{"id":"cJIgyaB7t2ON","colab_type":"code","colab":{}},"source":["g5 = tf.Graph()\n","with g5.as_default():\n","    #read the protobuf file into a graph definition\n","    with tf.gfile.GFile(\"Seminar3_graph/const_graph.pb\",'rb') as f:\n","        graph_def = tf.GraphDef()\n","        graph_def.ParseFromString(f.read())\n","    \n","    #import the graph definition into the current graph (g5)\n","    tf.import_graph_def(graph_def, name='')\n","    \n","    #display it. Looks sooo clean now!\n","    gr_disp.show(graph_def)"],"execution_count":0,"outputs":[]},{"cell_type":"code","metadata":{"id":"nu7AI5U4t2OP","colab_type":"code","colab":{}},"source":["#check if it works\n","with tfSessionLimited(graph=g5) as sess:\n","    # get input and output tensors, and run it for one image:\n","    x5 = g5.get_tensor_by_name(\"inception/input:0\")\n","    y5 = g5.get_tensor_by_name(\"Logits:0\")\n","    p5 = g5.get_tensor_by_name(\"YPred:0\")\n","    r5,rp5 = sess.run([y5,p5], feed_dict={x5:images1[:1]})\n","    \n","    print(r5, rp5)"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"2K71p8F0t2OS","colab_type":"text"},"source":["## 9. Improving the results"]},{"cell_type":"markdown","metadata":{"id":"iwVcKUelt2OU","colab_type":"text"},"source":["Often, as in this example, we don't have enough labeled data at hand, so we need to use it as efficiently as possible.\n","One way to do this is to apply training data augmentation: we slightly distort the training images, e.g. rescale them, to effectively multiply the dataset."]},{"cell_type":"markdown","metadata":{"id":"zoER6gFgt2OV","colab_type":"text"},"source":["We will generate randomly rescaled images: at the smallest scale the smaller image dimension equals 256 pixels, and the largest scale is 130%. 
Let's define a function which will do this job:"]},{"cell_type":"code","metadata":{"id":"lBX83LXot2OV","colab_type":"code","colab":{}},"source":["def get_random_scaled_img(file, minsize = 256, scalemax=1.3):\n","    im = Image.open(file)\n","    w, h = im.size\n","    # minimal scale: the smaller side becomes minsize\n","    scalemin = float(minsize) / min(w,h)\n","    # draw a rescale factor uniformly from [scalemin, scalemax]\n","    scale = scalemin + np.random.rand() * (scalemax - scalemin)\n","    w1 = int(max(minsize, scale*w))\n","    h1 = int(max(minsize, scale*h))\n","    \n","    #rescale with smoothing (LANCZOS is the filter formerly called ANTIALIAS)\n","    im1 = im.resize((w1,h1), Image.LANCZOS)\n","    #get numpy array from the PIL Image\n","    img_arr = np.array(im1.convert('RGB'))\n","\n","    #crop the central 256x256 region, preventing further resize by prepare_training_img\n","    r = (img_arr.shape[0] - minsize) // 2\n","    c = (img_arr.shape[1] - minsize) // 2\n","    img_arr = img_arr[r:r+minsize,c:c+minsize]\n","\n","    return img_arr"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"GL8Qt87it2OX","colab_type":"text"},"source":["Let's check the rescaled images."]},{"cell_type":"code","metadata":{"id":"0C4aL8x_t2OX","colab_type":"code","colab":{}},"source":["n_smpl=2\n","scaled_imgs=[get_random_scaled_img('ML3/de/%d_%d.jpg'%(1, 1)) for i in range(n_smpl**2)]\n","fig, ax = plt.subplots(n_smpl, n_smpl, figsize=(n_smpl*4, n_smpl*4))\n","for row in range(n_smpl):\n","    for col in range(n_smpl):\n","        ax[row, col].imshow(scaled_imgs[row*n_smpl+col])\n","        ax[row, col].grid(False)"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"kisfRq3vt2OZ","colab_type":"text"},"source":["Read the images again, now generating 5 randomly rescaled versions of each one."]},{"cell_type":"code","metadata":{"id":"o3DtE8hQt2OZ","colab_type":"code","colab":{}},"source":["labels0 = []\n","images0 = []\n","labels1 = []\n","images1 = []\n","\n","mult = 5\n","#German\n","for book in range(1,6):\n","    for sample in 
range(1,11):\n","        for itr in range(mult):\n","            img = get_random_scaled_img('ML3/de/%d_%d.jpg'%(book, sample))\n","            assert(img.shape[0]>=256 and img.shape[1]>=256 and len(img.shape)==3)\n","            images0.append(inception.prepare_training_img(img))\n","            labels0.append([1,0])\n","#Italian\n","for book in range(1,6):\n","    for sample in range(1,11):\n","        for itr in range(mult):\n","            img = get_random_scaled_img('ML3/it/%d_%d.jpg'%(book, sample))\n","            assert(img.shape[0]>=256 and img.shape[1]>=256 and len(img.shape)==3)\n","            images1.append(inception.prepare_training_img(img))\n","            labels1.append([0,1])\n","        \n","idx = np.random.permutation(len(labels0))\n","labels0 = np.array(labels0)[idx]\n","images0 = np.array(images0)[idx]\n","labels1 = np.array(labels1)[idx]\n","images1 = np.array(images1)[idx]"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"ZeDPdwjHt2Ob","colab_type":"text"},"source":["Finally, we train again in the same way. Only the number of epochs changes: before we had 150, but now that we have 5 times more training data we'll do 60. 
Although 60 > 150/5, training on the augmented data seems to need a bit more time to converge.\n","We use the same trainable graph as before, `g2`."]},{"cell_type":"code","metadata":{"id":"XPTso1cft2Ob","colab_type":"code","colab":{}},"source":["n_half = images0.shape[0]\n","n_train_half = n_half*80//100\n","n_train = n_train_half*2\n","\n","x_train = np.r_[images0[:n_train_half], images1[:n_train_half]]\n","y_train = np.r_[labels0[:n_train_half], labels1[:n_train_half]]\n","\n","x_valid = np.r_[images0[n_train_half:], images1[n_train_half:]]\n","y_valid = np.r_[labels0[n_train_half:], labels1[n_train_half:]]\n","\n","mini_batch_size = 10\n","\n","\n","with tfSessionLimited(graph=g2) as sess:\n","    #accuracy and loss history\n","    a_tr = []\n","    a_vld = []\n","    losses_t = []\n","    losses_v = []\n","\n","    #create saver and initialize all the variables\n","    saver = tf.train.Saver()\n","    sess.run(tf.global_variables_initializer())\n","\n","    for epoch in range(60):\n","        #shuffle the data and perform stochastic gradient descent by running over all minibatches\n","        idx = np.random.permutation(n_train)\n","        for mb in range(n_train//mini_batch_size):\n","            sub_idx = idx[mini_batch_size*mb:mini_batch_size*(mb+1)]\n","            _, l = sess.run((optimizer, loss), feed_dict={x:x_train[sub_idx], Y:y_train[sub_idx]})\n","            l_v = sess.run(loss, feed_dict={x:x_valid, Y:y_valid})\n","            losses_t.append(np.mean(l))\n","            losses_v.append(np.mean(l_v))\n","\n","        accuracy_train = sess.run(Accuracy, feed_dict={x:x_train, Y:y_train})\n","        accuracy_valid = sess.run(Accuracy, feed_dict={x:x_valid, Y:y_valid})\n","        if epoch%5 == 0:\n","            print(accuracy_train, accuracy_valid)\n","        a_tr.append(accuracy_train)\n","        a_vld.append(accuracy_valid)\n","        \n","    #save the graph state, checkpoint ch-1\n","    checkpoint_prefix = os.path.join('Seminar3_graph', 'ch')\n","    saver.save(sess, 
checkpoint_prefix, global_step=1, latest_filename='ch_last')\n","    \n","plt.plot(a_tr)\n","plt.plot(a_vld)\n","plt.legend(('training accuracy', 'validation accuracy'), loc='lower right')\n","plt.show()\n","\n","plt.plot(losses_t)\n","plt.plot(losses_v)\n","plt.legend(('training loss','validation loss'), loc='upper right')\n","plt.show()"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"XHZx8jQot2Oc","colab_type":"text"},"source":["We had a REALLY small dataset for such a complicated task. Does the model really generalize, or does it just memorize all the images we fed into it? Let's perform a test. `w1.jpg` and `w2.jpg` are screenshots of text from Wikipedia in [Italian](https://it.wikipedia.org/wiki/Apprendimento_automatico) and [German](https://de.wikipedia.org/wiki/Maschinelles_Lernen)."]},{"cell_type":"code","metadata":{"id":"XGbAqHCWt2Od","colab_type":"code","colab":{}},"source":["# load images\n","im_wiki_1 = plt.imread('ML3/w1.jpg')\n","im_wiki_2 = plt.imread('ML3/w2.jpg')\n","\n","# crop/convert for proper color range\n","im_wiki_1_p = inception.prepare_training_img(im_wiki_1)[np.newaxis]\n","im_wiki_2_p = inception.prepare_training_img(im_wiki_2)[np.newaxis]\n","\n","with tfSessionLimited(graph=g2) as sess:\n","    saver = tf.train.Saver()\n","    # load checkpoint after the last training\n","    saver.restore(sess, os.path.join('Seminar3_graph', 'ch-1'))\n","        \n","    # get predictions\n","    pred1 = sess.run(Y_onehot, feed_dict={x:im_wiki_1_p})\n","    pred2 = sess.run(Y_onehot, feed_dict={x:im_wiki_2_p})\n","\n","    # will it be ok???\n","    print('probabilities for w1:', pred1, 'detected language:', text_label[np.argmax(pred1)])\n","    print('probabilities for w2:', pred2, 'detected language:', text_label[np.argmax(pred2)])\n","\n","    # Show image crops\n","    plt.imshow( inception.training_img_to_display(im_wiki_1_p[0]))\n","    plt.show()\n","    plt.imshow( inception.training_img_to_display(im_wiki_2_p[0]))\n","    
plt.show()"],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"gv8FlrBjt2Of","colab_type":"text"},"source":["As you can see, it works. The probabilities are close to 100%, meaning the net is confident about these languages; it is not just a random fluctuation around the 50% margin."]},{"cell_type":"markdown","metadata":{"id":"MN0LvX7dt2Of","colab_type":"text"},"source":["## 10. Exercise 1"]},{"cell_type":"markdown","metadata":{"id":"HGzj4VPFt2Og","colab_type":"text"},"source":["There is a serious problem in the example above: the training and validation datasets are not independent. We generated 5 randomly scaled images from each original image, so with high probability some of the 5 images generated from the same original end up in the training set and others in the validation set. Since they come from the same original image, they are not independent. This compromises the evaluation of the model and leads to an overestimate of its performance.\n","\n","1. Modify the generation of the training and validation datasets to fulfil the independence requirement.\n","2. Check how the validation accuracy and loss change."]},{"cell_type":"markdown","metadata":{"id":"Bj0ra0Cpt2Og","colab_type":"text"},"source":["## 11. Exercise 2"]},{"cell_type":"markdown","metadata":{"id":"PSMxY4ezt2Oh","colab_type":"text"},"source":["(If we have time left...)\n","Test the performance of the model trained on the NOT rescaled images on the Wikipedia screenshots."]},{"cell_type":"code","metadata":{"id":"CoFMTLTQt2Oi","colab_type":"code","colab":{}},"source":["#copy the above code here\n","#load the checkpoint ch-0 instead of ch-1\n","\n","\n","...."],"execution_count":0,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"1u0_qadxt2Ok","colab_type":"text"},"source":["## 12. 
Homework (3 options)"]},{"cell_type":"markdown","metadata":{"id":"q5x83mdht2Ol","colab_type":"text"},"source":["### 12.1 Improve the training set"]},{"cell_type":"markdown","metadata":{"id":"-SetU7UKt2Ol","colab_type":"text"},"source":["So far we scaled the images uniformly, by the same factor in both directions.\n","- Try to scale differently in the $x$ and $y$ directions.\n","- Check how it affects performance.\n","- Which other transformations would make sense for text data?\n","- Get your hands dirty."]},{"cell_type":"markdown","metadata":{"id":"FxfnvGort2Om","colab_type":"text"},"source":["### 12.2 Try to use lower layers' outputs from Inception to build the classifier."]},{"cell_type":"markdown","metadata":{"id":"Gn0H865Xt2On","colab_type":"text"},"source":["So far we used the last output of Inception.\n","- Look at the Inception graph more carefully.\n","- Inspect the size of the data array at different layers.\n","- Since inside you have 3D data (2D image x features at each position), you will need to flatten it. Look at how this is done in the last layers (`head0`). Alternatively, you can create convolutional layers.\n","- Ask, google it, and get your hands dirty!"]},{"cell_type":"markdown","metadata":{"id":"lwcVPGjtt2Oo","colab_type":"text"},"source":["### 12.3 Classify 3 languages."]},{"cell_type":"markdown","metadata":{"id":"qQl062yyt2Oo","colab_type":"text"},"source":["So far we tried two languages.\n","- Create 50 crops of text in another language (preferably from 5 sources with different fonts; otherwise you risk learning the font rather than the language), with image size > 300 x 300 (to allow scaling).\n","- Upload them into a new directory `xx` inside the `ML3` directory.\n","- Repeat everything with 3 classes.\n","- Think of cases where this approach won't work.\n","- Get your hands dirty!!!"]}]}